DSC 140B

Problem #161

Tags: lecture-15, convolutional neural networks

A grayscale image of size $32 \times 32 \times 1$ is convolved with a filter of size $5 \times 5$. No padding is applied, and the stride is 1. What is the shape of the output response map?

Solution

$28 \times 28 \times 1$.

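With no padding and stride 1, the output height and width are each $32 - 5 + 1 = 28$, since the filter fits at $28$ distinct positions along each axis. The filter slides over each $5 \times 5$ block of the image from left to right and top to bottom, producing one output value per position. Because a single filter is applied, the output has one channel.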
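The size calculation above can be sketched as a small helper. The function name `conv_output_size` and its `padding` and `stride` parameters are illustrative choices, not something defined in the problem; the general formula it uses reduces to $32 - 5 + 1$ when padding is 0 and stride is 1.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Number of valid filter positions along one spatial axis."""
    return (input_size + 2 * padding - filter_size) // stride + 1


# Problem setup: 32x32 image, 5x5 filter, no padding, stride 1.
side = conv_output_size(32, 5)

# A single filter produces a single output channel.
print((side, side, 1))  # (28, 28, 1)
```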

Problem #162

Tags: lecture-15, convolutional neural networks

An input $5 \times 5$ grayscale image $(I)$ is represented by the matrix below.

\[ I = \begin{pmatrix} 0.2 & 0.1 & 0.4 & 0 & 0.3 \\ 0 & 0.5 & 0.2 & 0.7 & 0 \\ 0.3 & 0 & 0.6 & 0.1 & 0.5 \\ 0.1 & 0.4 & 0 & 0.3 & 0.2 \\ 0 & 0.2 & 0.5 & 0 & 0.4 \end{pmatrix}\]

Suppose you convolve $I$ with the $3 \times 3$ filter

\[ F = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}\]

to get the response map $I'$ (with stride 1 and no padding). What is the value of $I'_{11}$, the entry in the 1st row and 1st column of the output?

Solution

$I'_{11} = 0.6$.

The $3 \times 3$ patch at the top-left corner of $I$ is:

\[\begin{pmatrix} 0.2 & 0.1 & 0.4 \\ 0 & 0.5 & 0.2 \\ 0.3 & 0 & 0.6 \end{pmatrix}\]

Applying the filter element-wise and summing:

$$\begin{align*} I'_{11}&= 0.2 \cdot 1 + 0.1 \cdot 0 + 0.4 \cdot(-1) + 0 \cdot 0 + 0.5 \cdot 1 + 0.2 \cdot 0 \\&\quad + 0.3 \cdot(-1) + 0 \cdot 0 + 0.6 \cdot 1 \\&= 0.2 - 0.4 + 0.5 - 0.3 + 0.6 \\&= 0.6 \end{align*}$$
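As a check on the arithmetic, here is a minimal pure-Python sketch of the same computation. The names `image`, `filt`, and `convolve_valid` are hypothetical. Following the solution, the filter is applied element-wise without flipping; for this particular $F$ a 180-degree flip would leave the filter unchanged anyway.

```python
image = [
    [0.2, 0.1, 0.4, 0.0, 0.3],
    [0.0, 0.5, 0.2, 0.7, 0.0],
    [0.3, 0.0, 0.6, 0.1, 0.5],
    [0.1, 0.4, 0.0, 0.3, 0.2],
    [0.0, 0.2, 0.5, 0.0, 0.4],
]

filt = [
    [1, 0, -1],
    [0, 1, 0],
    [-1, 0, 1],
]


def convolve_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            # Multiply the patch by the kernel element-wise, then sum.
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


response = convolve_valid(image, filt)

# response[0][0] is I'_{11}; it should agree with 0.6 up to float rounding.
print(round(response[0][0], 10))
```

The response map has $5 - 3 + 1 = 3$ rows and columns, so `response` is a $3 \times 3$ nested list.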

Problem #163

Tags: lecture-15, convolutional neural networks

An input image has shape $80 \times 80 \times 11$, where $11$ is the number of channels. We wish to convolve this image with a 3D filter of shape $5 \times 5 \times k$. What must the value of $k$ be for the convolution to work?

Solution

$k = 11$.

A 3D convolution filter must have the same number of channels as the input. The filter slides spatially across the height and width of the image, but at each position it computes a dot product with the $5 \times 5 \times 11$ block of the image beneath it, which spans all channels. Therefore the filter's third dimension must match the input's channel count.
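The channel-matching rule can be expressed as a simple shape check. This is only a sketch: `response_map_shape` is a hypothetical helper, and it assumes stride 1 and no padding, which the problem does not specify.

```python
def response_map_shape(image_shape, filter_shape):
    """Spatial shape of the response map for one 3D filter.

    Assumes stride 1 and no padding (an assumption, not part of the problem).
    """
    height, width, channels = image_shape
    f_height, f_width, f_channels = filter_shape
    if f_channels != channels:
        raise ValueError(
            f"filter has {f_channels} channels but image has {channels}"
        )
    # The filter only slides along height and width, never along channels.
    return (height - f_height + 1, width - f_width + 1)


print(response_map_shape((80, 80, 11), (5, 5, 11)))  # (76, 76)
```

Calling it with any other value of $k$, for example `response_map_shape((80, 80, 11), (5, 5, 3))`, raises a `ValueError`.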

Problem #164

Tags: lecture-15, convolutional neural networks

Consider a convolutional neural network with the following architecture. The input is a $10 \times 10 \times 1$ grayscale image. It passes through Conv layer 1 (3 filters of size $3 \times 3$, stride 1, no padding), producing an output of shape $8 \times 8 \times 3$. Then $2 \times 2$ max pooling is applied, producing an output of shape $4 \times 4 \times 3$. Next is Conv layer 2 (5 filters of size $3 \times 3 \times 3$, stride 1, no padding), producing an output of shape $2 \times 2 \times 5$. This is flattened and fed into a fully connected layer with $n$ nodes, followed by an output layer with 1 node.

Part 1)

What is the value of $n$?

Solution

$n = 20$.

The output of Conv layer 2 is $2 \times 2 \times 5$. Flattening this gives $2 \times 2 \times 5 = 20$ values, so the fully connected layer has $20$ nodes.
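The shapes stated in the problem can be traced layer by layer. The helpers `conv_shape` and `pool_shape` below are illustrative names; they assume stride 1 with no padding for convolutions and non-overlapping pooling windows, matching the architecture described.

```python
def conv_shape(shape, num_filters, filter_size):
    """Output shape of a conv layer with stride 1 and no padding."""
    height, width, _ = shape
    return (height - filter_size + 1, width - filter_size + 1, num_filters)


def pool_shape(shape, pool_size):
    """Output shape of non-overlapping max pooling."""
    height, width, channels = shape
    return (height // pool_size, width // pool_size, channels)


shape = (10, 10, 1)                                       # input image
shape = conv_shape(shape, num_filters=3, filter_size=3)   # (8, 8, 3)
shape = pool_shape(shape, pool_size=2)                    # (4, 4, 3)
shape = conv_shape(shape, num_filters=5, filter_size=3)   # (2, 2, 5)

# Flattening multiplies the three dimensions together.
n = shape[0] * shape[1] * shape[2]
print(shape, n)  # (2, 2, 5) 20
```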

Part 2)

What is the total number of learnable parameters in the network, excluding biases?

Solution

$182$.

Conv layer 1 has 3 filters of shape $3 \times 3 \times 1$ (the input has a single channel), each with $9$ weights, for $3 \times 9 = 27$ parameters. Conv layer 2 has 5 filters of shape $3 \times 3 \times 3$, each with $27$ weights, for $5 \times 27 = 135$ parameters. Each of the $20$ nodes in the fully connected layer connects to the single output node, for $20 \times 1 = 20$ parameters. The grand total is $27 + 135 + 20 = 182$.

Note that max pooling has no learnable parameters.
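The count can be reproduced with a few lines of arithmetic. This sketch follows the solution's accounting (biases excluded, and the $20$ fully connected nodes each feeding the single output node); `conv_weights` is a hypothetical helper name.

```python
def conv_weights(num_filters, filter_height, filter_width, in_channels):
    """Weights in a conv layer, excluding biases."""
    return num_filters * filter_height * filter_width * in_channels


conv1 = conv_weights(3, 3, 3, in_channels=1)   # 3 filters of 3x3x1
conv2 = conv_weights(5, 3, 3, in_channels=3)   # 5 filters of 3x3x3
pooling = 0                                    # max pooling learns nothing
dense_to_output = 20 * 1                       # 20 nodes -> 1 output node

total = conv1 + pooling + conv2 + dense_to_output
print(conv1, conv2, dense_to_output, total)  # 27 135 20 182
```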

Problem #165

Tags: lecture-15, convolutional neural networks

An input $4 \times 4$ grayscale image $(I)$ is represented by the matrix below.

\[ I = \begin{pmatrix} 0.7 & 0.2 & 0.1 & 0.8 \\ 0.3 & 0.5 & 0.4 & 0.2 \\ 0.6 & 0.1 & 0.9 & 0.3 \\ 0.2 & 0.8 & 0.5 & 0.6 \end{pmatrix}\]

$2 \times 2$ max pooling is applied to this image. What is the resulting output?

Solution

$\begin{pmatrix} 0.7 & 0.8 \\ 0.8 & 0.9 \end{pmatrix}$.

With $2 \times 2$ max pooling, we divide the $4 \times 4$ image into four non-overlapping $2 \times 2$ blocks and take the maximum of each. Top-left: $\max\{0.7, 0.2, 0.3, 0.5\} = 0.7$. Top-right: $\max\{0.1, 0.8, 0.4, 0.2\} = 0.8$. Bottom-left: $\max\{0.6, 0.1, 0.2, 0.8\} = 0.8$. Bottom-right: $\max\{0.9, 0.3, 0.5, 0.6\} = 0.9$.
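The block-by-block maxima above can be checked with a short pure-Python sketch. The function name `max_pool` and its `size` parameter are illustrative; the window moves in steps equal to its size, so the blocks do not overlap.

```python
image = [
    [0.7, 0.2, 0.1, 0.8],
    [0.3, 0.5, 0.4, 0.2],
    [0.6, 0.1, 0.9, 0.3],
    [0.2, 0.8, 0.5, 0.6],
]


def max_pool(image, size=2):
    """Non-overlapping max pooling: take the max of each size x size block."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


pooled = max_pool(image)
print(pooled)  # [[0.7, 0.8], [0.8, 0.9]]
```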
